Handling Inconsistency for Multi-Source Integration
Abstract
The overwhelming number of information sources now available through the internet has increased the need to combine, or integrate, the data retrieved from these sources in an intelligent and efficient manner. A desirable approach to information integration is a single interface, such as the SIMS information broker [1], that provides access to multiple information sources. An example application is retrieving the menus of all restaurants on Joe’s Favorite Restaurants site that have been rated highly by the Department of Health. Performed manually, this task would require a significant amount of work from the user. An information broker like SIMS provides access to multiple information sources while abstracting away the need for the user to know the location or query access methods of any particular source.

SIMS stores knowledge about the data contained in each of these sources, as well as the relationships between the sources, in a model called the domain model. The first step in creating the domain model is to determine which data instances appear in multiple sources, e.g., which restaurants from Joe’s web site, such as “Art’s Deli,” also appear on the Health Department’s site. Once the common data instances are determined, the relationship between the data in the sources can be modeled in the domain model as subset, superset, equality, or overlapping.

A special case for information integration arises when data instances exist in inconsistent formats across several sources; e.g., the restaurant “Art’s Deli” can appear as “Art’s Delicatessen” in another source. For integration, each source can be seen as a relation, so integrating two sources requires performing a join on the two relations by comparing the instances of their primary keys. Since the instances have inconsistent formats, some mapping information is needed to map one instance to another (e.g., “Art’s Deli” from Joe’s source to “Art’s Delicatessen” from the Department of Health site). This information can be stored as a mapping table, or as a mapping function if a compact translation can be found that accurately converts data instances from one source into the format of another. Once a mapping construct is created, it can be modeled as a new information source. This integration technique allows SIMS to properly integrate data across several sources that contain inconsistent data instances. Presently, mapping constructs are generated manually, but we are developing a semi-automated approach. The figure contains tuples from Joe’s Restaurant source and the matching tuples from the Health Department.

Copyright © 1998, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
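The join through a mapping construct can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SIMS implementation: each source is modeled as a list of dictionaries (a relation), and the attribute names `name`, `menu` and `rating`, every restaurant other than Art’s Deli, and the helpers `expand_deli`, `make_translator`, `mapped_join` and `relationship` are hypothetical. Each key from Joe’s source is translated into the Health Department’s format, using the mapping table first and an optional mapping function otherwise, and then an ordinary hash join is performed. The sketch also classifies how the two key sets relate, adding a `disjoint` case beyond the four relationships named in the abstract.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Row = Dict[str, str]

# Hypothetical relations. Only "Art's Deli" / "Art's Delicatessen" come from
# the abstract; the other names, the attributes, and the values are invented.
JOES: List[Row] = [
    {"name": "Art's Deli", "menu": "arts-menu"},
    {"name": "Example Cafe", "menu": "cafe-menu"},
]
HEALTH_DEPT: List[Row] = [
    {"name": "Art's Delicatessen", "rating": "A"},
    {"name": "Example Cafe", "rating": "C"},
    {"name": "Example Grill", "rating": "A"},
]

# Mapping table: explicit pairs from Joe's key format to the Health Dept.'s.
MAPPING_TABLE: Dict[str, str] = {"Art's Deli": "Art's Delicatessen"}


def expand_deli(name: str) -> str:
    """Hypothetical mapping function: one compact rule that expands 'Deli'."""
    return " ".join("Delicatessen" if w == "Deli" else w for w in name.split())


def make_translator(
    table: Optional[Dict[str, str]] = None,
    func: Optional[Callable[[str], str]] = None,
) -> Callable[[str], str]:
    """Build a key translator: table lookup first, then func, else unchanged."""

    def translate(key: str) -> str:
        if table is not None and key in table:
            return table[key]
        return func(key) if func is not None else key

    return translate


def mapped_join(
    left: List[Row], right: List[Row], key: str, translate: Callable[[str], str]
) -> List[Tuple[Row, Row]]:
    """Hash equi-join on `key`, translating each left key into the right's format."""
    index: Dict[str, List[Row]] = {}
    for rrow in right:
        index.setdefault(rrow[key], []).append(rrow)
    return [(lrow, rrow) for lrow in left for rrow in index.get(translate(lrow[key]), [])]


def relationship(left_keys: Set[str], right_keys: Set[str]) -> str:
    """Classify two key sets; 'disjoint' is an extra case not named in the text."""
    if left_keys == right_keys:
        return "equality"
    if left_keys < right_keys:
        return "subset"
    if left_keys > right_keys:
        return "superset"
    return "overlapping" if left_keys & right_keys else "disjoint"


if __name__ == "__main__":
    by_table = make_translator(table=MAPPING_TABLE)
    pairs = mapped_join(JOES, HEALTH_DEPT, "name", by_table)
    # Menus of Joe's restaurants that the Health Department rated "A".
    print([j["menu"] for j, h in pairs if h["rating"] == "A"])  # ['arts-menu']

    health_keys = {h["name"] for h in HEALTH_DEPT}
    print(relationship({j["name"] for j in JOES}, health_keys))  # overlapping
    print(relationship({by_table(j["name"]) for j in JOES}, health_keys))  # subset

    by_func = make_translator(func=expand_deli)
    print(len(mapped_join(JOES, HEALTH_DEPT, "name", by_func)))  # 2
```

Without the mapping, the two toy sources look merely overlapping because “Art’s Deli” never matches “Art’s Delicatessen”. After translation, Joe’s restaurants form a subset of the Health Department’s list, which is the kind of relationship a domain model would record. Either construct (the explicit table or the one-rule function) is enough for this data; a table handles arbitrary pairs, while a function only helps when a compact rule exists.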
Similar resources
Geo-Web Service Tool for Spatial Data Integrability
The integration of multi-source heterogeneous spatial data is one of the major challenges for many spatial data users. Users put much effort into identifying and overcoming inconsistency among data sets through a time-consuming and costly process. Spatial applications that rely on multi-source heterogeneous data also suffer from the lack of automatic mechanism to identify the inconsistency items and as...
A Solution for Data Inconsistency in Data Integration
Data integration is a problem of combining data residing at different sources and providing the user with a unified view of these data. An important issue in data integration is the possibility of conflicts among the different data sources. Data sources may conflict with each other at data value level which is defined as data inconsistency. So in this paper, a solution for data inconsistency in...
Inconsistency-Tolerance in Knowledge-Based Systems by Dissimilarities
Distance-based reasoning is a well-known approach for defining non-monotonic and paraconsistent formalisms, which so far has been mainly used in the context of standard two-valued semantics. In this paper, we extend this approach to arbitrary denotational semantics by considering dissimilarities, a generalization of distances. Dissimilarity-based reasoning is then applied for handling inconsist...
Learning to Handle Inconsistency for Multi-Source Integration
Many problems arise when trying to integrate information from multiple sources on the web. One of these problems is that data instances can exist in inconsistent formats across several sources. An example application of information integration is trying to integrate all the reviews of Los Angeles restaurants from Yahoo’s Restaurants webpage with the current health rating for each restaurant fro...
Finding Explanations of Inconsistency in Multi-Context Systems
We provide two approaches for explaining inconsistency in multi-context systems, where decentralized and heterogeneous system parts interact via nonmonotonic bridge rules. Inconsistencies arise easily in such scenarios, and nonmonotonicity calls for specific methods of inconsistency analysis. Both our approaches characterize inconsistency in terms of involved bridge rules: either by pointing ou...
Enabling Spatial Data Sharing through Multi-source Spatial Data Integration
The dynamic environment of SDIs and the involvement of diverse spatial data providers present uncertainty for involving organizations. This pushes organizations to focus on cooperative data sharing relationships to deliver their objectives. Spatial data sharing provides transactions in which individuals, governments and businesses obtain access to spatial data and services from other stakeholde...
Publication date: 1998